{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "72066422-7d11-4f63-9113-17b60626842b",
   "metadata": {},
   "source": [
    "# Lesson 11: Image Captioning"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66adfac7",
   "metadata": {},
   "source": [
    "- In the classroom, the libraries are already installed for you.\n",
    "- If you would like to run this code on your own machine, you can install the following:\n",
    "\n",
    "```\n",
    "    !pip install transformers\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e2ffb83a-cd74-472c-ab75-be74944bdfd0",
   "metadata": {},
   "source": [
    "- Here is some code that suppresses warning messages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5521ba88-d370-4c32-933d-455b2cbca604",
   "metadata": {
    "height": 98
   },
   "outputs": [],
   "source": [
    "from transformers.utils import logging\n",
    "logging.set_verbosity_error()\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\", message=\"Using the model-agnostic default `max_length`\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6ceb84a-5362-4b12-a253-354fef68fd64",
   "metadata": {},
   "source": [
    "- Load the Model and the Processor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "555e6e8f-1586-4727-a3d7-9046cb3ff438",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "from transformers import BlipForConditionalGeneration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b6b34e0-4e0a-4ba6-8adc-bed69968c2fc",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "model = BlipForConditionalGeneration.from_pretrained(\n",
    "    \"./models/Salesforce/blip-image-captioning-base\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f22c659",
   "metadata": {},
   "source": [
    "Info about [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "41fb7bdb-88a0-4c2a-9d78-b6b8d0c72c26",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "from transformers import AutoProcessor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a98b435c-dc46-4668-9cb6-f47eb5636412",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "processor = AutoProcessor.from_pretrained(\n",
    "    \"./models/Salesforce/blip-image-captioning-base\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f06b43b7-2b5e-4f03-90af-c124c3070f00",
   "metadata": {},
   "source": [
    "- Load the image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f10ea0ac-d067-45ad-a5e5-8d21b907b910",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "from PIL import Image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "529cd3a3-4e00-46a2-8e41-432a7276cb39",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "image = Image.open(\"./beach.jpeg\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31e94a9a-d27d-405f-ada5-045ad9581c31",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "image"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35906e2d-6ccb-485b-91e5-efd5fabc63a8",
   "metadata": {},
   "source": [
    "### Conditional Image Captioning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2238435a-af4e-440a-9001-4a2ea01d6595",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "text = \"a photograph of\"\n",
    "inputs = processor(image, text, return_tensors=\"pt\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53b179ea-c5d4-4cf4-b4db-a806c052a23a",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "inputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7a124287-b2fe-46bc-83cb-85354f7b2da1",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "out = model.generate(**inputs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f3f16848-ee99-498a-9972-0168bbfa3abc",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "out"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "50465206-d384-4a8e-b154-37a002cfa331",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "print(processor.decode(out[0], skip_special_tokens=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39d0e425-d5c0-4968-88b5-b47c2307b7fe",
   "metadata": {},
   "source": [
    "### Unconditional Image Captioning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42b5ed22-9beb-4edc-b99d-3df8a5568d05",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "inputs = processor(image,return_tensors=\"pt\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "547b2ae9-76c6-45f5-b1b1-c2306986e177",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "out = model.generate(**inputs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "abc75041-447b-45c2-95f5-fdd31a6d1e6f",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "print(processor.decode(out[0], skip_special_tokens=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe21dda9",
   "metadata": {},
   "source": [
    "### Try it yourself! \n",
    "- Try this model with your own images and texts!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
